LREC 2012 Workshop on Language Resource Merging

Authors

  • Riccardo Del Gratta
  • Francesca Frontini
  • Monica Monachini
  • Valeria Quochi
  • Cristina Vertan
  • Svetla Koeva
  • Adam Przepiórkowski
  • Maciej Ogrodniczuk
  • Dan Cristea
  • Eugen Ignat
  • Laura Rimell
  • Thierry Poibeau
Abstract

The talk will present UBY, a large-scale resource integration project based on the Lexical Markup Framework (LMF, ISO 24613:2008). Currently, nine lexicons in two languages (English and German) have been integrated: WordNet, GermaNet, FrameNet, VerbNet, Wikipedia (DE/EN), Wiktionary (DE/EN), and OmegaWiki. All resources have been mapped to the LMF-based model and imported into an SQL database. The UBY-API, a common Java software library, provides access to all data in the database. The nine lexicons are densely interlinked using monolingual and cross-lingual sense alignments. These sense alignments yield enriched sense representations and increased coverage. A sense alignment framework has been developed for automatically aligning any pair of resources mono- or cross-lingually. As an example, the talk will report on the automatic alignment of WordNet and Wiktionary. Further information on UBY and UBY-API is available at: http://www.ukp.tu-darmstadt.de/data/lexical-resources/uby/.
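The idea of interlinking senses across resources can be illustrated with a minimal sketch. This is not the UBY-API (which is a Java library) and does not follow the LMF model; the class names `Sense` and `AlignedLexicon`, the sense identifiers, and the glosses are all hypothetical and chosen only for illustration. It shows how pairwise sense alignments, once stored, let a lookup for one sense return the aligned entries from other resources and languages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    """One sense entry from one resource (hypothetical, simplified)."""
    resource: str   # e.g. "WordNet"
    sense_id: str   # made-up identifier, not a real resource ID
    lemma: str
    language: str
    gloss: str


@dataclass
class AlignedLexicon:
    """Stores senses from several resources plus symmetric alignment links."""
    senses: dict = field(default_factory=dict)  # (resource, sense_id) -> Sense
    links: dict = field(default_factory=dict)   # key -> set of aligned keys

    def add_sense(self, sense):
        key = (sense.resource, sense.sense_id)
        self.senses[key] = sense
        self.links.setdefault(key, set())
        return key

    def align(self, key_a, key_b):
        # Alignments are stored in both directions.
        self.links[key_a].add(key_b)
        self.links[key_b].add(key_a)

    def enriched(self, key):
        """Return the sense itself followed by its directly aligned senses."""
        return [self.senses[key]] + [self.senses[k] for k in sorted(self.links[key])]


# Illustrative data only: IDs and glosses are invented for this sketch.
lex = AlignedLexicon()
wn = lex.add_sense(Sense("WordNet", "wn-bank-1", "bank", "en", "a financial institution"))
wkt = lex.add_sense(Sense("Wiktionary", "wkt-en-bank-1", "bank", "en", "an institution where one can deposit money"))
gn = lex.add_sense(Sense("GermaNet", "gn-Bank-1", "Bank", "de", "Geldinstitut"))

lex.align(wn, wkt)  # monolingual alignment
lex.align(wn, gn)   # cross-lingual alignment

resources = [s.resource for s in lex.enriched(wn)]
languages = {s.language for s in lex.enriched(wn)}
```

In this sketch the enriched view of the WordNet sense collects glosses from three resources in two languages, which is the kind of increased coverage the abstract attributes to sense alignments. How alignments are computed automatically is outside the scope of the sketch.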


Similar resources

Towards an International Standard on Feature Structure Representation

This paper describes the preliminary results of a joint initiative of the TEI (Text Encoding Initiative) Consortium and the ISO Committee TC 37/SC 4 (Language Resource Management) to provide a standard for the representation and interchange of feature structures. The paper published in the proceedings of this workshop is in fact an extension of a paper published in the LREC 2004 proceedings, and...


Ninth Workshop on Building and Using Comparable Corpora Workshop Programme

Comparable corpora are the most versatile and valuable resource for multilingual Natural Language Processing. The speaker will argue that comparable corpora can support a wider range of applications than has been demonstrated so far in the state of the art. The talk will present completed and ongoing work conducted by the speaker and colleagues from his research group where comparable corpora a...


Resources and Techniques for Multilingual Information Extraction

Official travel warnings published regularly on the internet by the ministries for foreign affairs of France, Germany, and the UK provide a useful resource for assessing the risks associated with travelling to some countries. The shallow IE system SProUT has been extended to meet the specific needs of delivering a language-neutral output for English, French, or German input texts. A shared type...


Building a Basque-Chinese Dictionary by Using English as Pivot

Bilingual dictionaries are key resources in several fields such as translation, language learning or various NLP tasks. However, only major languages have such resources. Automatically built dictionaries by using pivot languages could be a useful resource in these circumstances. Pivot-based bilingual dictionary building is based on merging two bilingual dictionaries which share a common languag...




Publication date: 2012